Noisy-OR Component Analysis and its Application to Link Analysis
Abstract
We develop a new component analysis framework, the Noisy-Or Component Analyzer (NOCA), that targets high-dimensional binary data. NOCA is a probabilistic latent variable model that assumes the observed high-dimensional binary data are generated by a small number of hidden binary sources combined via noisy-or units. The component analysis procedure is equivalent to learning the NOCA parameters. Since the classical EM formulation of the NOCA learning problem is intractable, we develop a variational approximation to it. We test the NOCA framework on two problems: (1) a synthetic image-decomposition problem and (2) a co-citation data analysis problem for thousands of CiteSeer documents. We demonstrate good performance of the new model on both problems. In addition, we contrast the model with two mixture-based latent-factor models: probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA). The differing assumptions underlying these models cause them to discover different types of structure in co-citation data, which illustrates the benefit of NOCA in building our understanding of high-dimensional datasets.
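To make the generative assumption concrete, the following is a minimal sketch of a standard noisy-or unit with a leak term, together with a sampler that draws hidden binary sources and then the observed bits. It is an illustration under stated assumptions, not the paper's implementation: the names `noisy_or_prob`, `sample_observation`, `priors`, `weight_matrix`, and `leaks` are hypothetical, and the exact parameterization used in NOCA may differ. The variational learning procedure is not shown.

```python
import random


def noisy_or_prob(sources, weights, leak):
    """Return P(x = 1 | sources) for one noisy-or unit.

    Assumed parameterization (textbook noisy-or, not taken from the paper):
      sources: list of 0/1 hidden source states
      weights: weights[i] is the probability that active source i alone turns x on
      leak:    probability that x turns on when no source is active
    """
    p_off = 1.0 - leak
    for s_i, p_i in zip(sources, weights):
        if s_i:
            # Each active source independently fails to turn x on with prob 1 - p_i.
            p_off *= 1.0 - p_i
    return 1.0 - p_off


def sample_observation(priors, weight_matrix, leaks, rng):
    """Sample hidden sources, then each observed bit from its noisy-or unit.

    priors:        priors[i] is P(source i = 1)
    weight_matrix: weight_matrix[i][j] links source i to observed dimension j
    leaks:         leaks[j] is the leak probability of observed dimension j
    """
    sources = [1 if rng.random() < p else 0 for p in priors]
    observed = []
    for j, leak in enumerate(leaks):
        column = [row[j] for row in weight_matrix]
        p_on = noisy_or_prob(sources, column, leak)
        observed.append(1 if rng.random() < p_on else 0)
    return sources, observed


# Usage: two hidden sources driving four observed binary dimensions.
rng = random.Random(0)
priors = [0.5, 0.3]
weight_matrix = [
    [0.9, 0.8, 0.0, 0.0],  # source 0 mostly explains dimensions 0 and 1
    [0.0, 0.0, 0.7, 0.9],  # source 1 mostly explains dimensions 2 and 3
]
leaks = [0.01, 0.01, 0.01, 0.01]
sources, observed = sample_observation(priors, weight_matrix, leaks, rng)
print(sources, observed)
```

In this sketch several sources can be active at once and each contributes independently to every observed bit, which is the property that distinguishes a noisy-or decomposition from the mixture-based models mentioned above.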
Journal: Journal of Machine Learning Research
Volume: 7
Publication date: 2006